Estimating evolutionary distances between genomic sequences from spaced-word matches.
Internal identifier: 000087 (France/Analysis); previous: 000086; next: 000088
Authors: Burkhard Morgenstern [France]; Bingyao Zhu [Germany]; Sebastian Horwege [Germany]; Chris André Leimeister [Germany]
Source:
- Algorithms for molecular biology : AMB [ 1748-7188 ] ; 2015.
Abstract
Alignment-free methods are increasingly used to calculate evolutionary distances between DNA and protein sequences as a basis of phylogeny reconstruction. Most of these methods, however, use heuristic distance functions that are not based on any explicit model of molecular evolution. Herein, we propose a simple estimator d_N of the evolutionary distance between two DNA sequences that is calculated from the number N of (spaced) word matches between them. We show that this distance function is more accurate than other distance measures that are used by alignment-free methods. In addition, we calculate the variance of the normalized number N of (spaced) word matches. We show that the variance of N is smaller for spaced words than for contiguous words, and that the variance is further reduced if our spaced-words approach is used with multiple patterns of 'match positions' and 'don't care positions'. Our software is available online and as downloadable source code at: http://spaced.gobics.de/.
DOI: 10.1186/s13015-015-0032-x
PubMed: 25685176
Links toward previous steps (curation, corpus...)
- to stream PubMed, to step Corpus: 001700
- to stream PubMed, to step Curation: 001700
- to stream PubMed, to step Checkpoint: 001559
- to stream Ncbi, to step Merge: 001041
- to stream Ncbi, to step Curation: 001041
- to stream Ncbi, to step Checkpoint: 001041
- to stream Main, to step Merge: 001809
- to stream Main, to step Curation: 001804
- to stream Main, to step Exploration: 001804
- to stream France, to step Extraction: 000087
Links to Exploration step
pubmed:25685176
The document in XML format
<record><TEI><teiHeader><fileDesc><titleStmt><title xml:lang="en">Estimating evolutionary distances between genomic sequences from spaced-word matches.</title>
<author><name sortKey="Morgenstern, Burkhard" sort="Morgenstern, Burkhard" uniqKey="Morgenstern B" first="Burkhard" last="Morgenstern">Burkhard Morgenstern</name>
<affiliation wicri:level="3"><nlm:affiliation>University of Göttingen, Department of Bioinformatics, Goldschmidtstr. 1, Göttingen, 37073 Germany ; Université d'Evry Val d'Essonne, Laboratoire Statistique et Génome, UMR CNRS 8071, USC INRA 23 Boulevard de France, Evry, 91037 France.</nlm:affiliation>
<country xml:lang="fr">France</country>
<wicri:regionArea>University of Göttingen, Department of Bioinformatics, Goldschmidtstr. 1, Göttingen, 37073 Germany ; Université d'Evry Val d'Essonne, Laboratoire Statistique et Génome, UMR CNRS 8071, USC INRA 23 Boulevard de France, Evry</wicri:regionArea>
<placeName><region type="region">Île-de-France</region>
<region type="old region">Île-de-France</region>
<settlement type="city">Évry (Essonne)</settlement>
</placeName>
</affiliation>
</author>
<author><name sortKey="Zhu, Bingyao" sort="Zhu, Bingyao" uniqKey="Zhu B" first="Bingyao" last="Zhu">Bingyao Zhu</name>
<affiliation wicri:level="3"><nlm:affiliation>University of Göttingen, Department of General Microbiology, Grisebachstr. 8, Göttingen, 37073 Germany.</nlm:affiliation>
<country xml:lang="fr">Allemagne</country>
<wicri:regionArea>University of Göttingen, Department of General Microbiology, Grisebachstr. 8, Göttingen</wicri:regionArea>
<placeName><region type="land" nuts="2">Basse-Saxe</region>
<settlement type="city">Göttingen</settlement>
</placeName>
</affiliation>
</author>
<author><name sortKey="Horwege, Sebastian" sort="Horwege, Sebastian" uniqKey="Horwege S" first="Sebastian" last="Horwege">Sebastian Horwege</name>
<affiliation wicri:level="3"><nlm:affiliation>University of Göttingen, Department of Bioinformatics, Goldschmidtstr. 1, Göttingen, 37073 Germany.</nlm:affiliation>
<country xml:lang="fr">Allemagne</country>
<wicri:regionArea>University of Göttingen, Department of Bioinformatics, Goldschmidtstr. 1, Göttingen</wicri:regionArea>
<placeName><region type="land" nuts="2">Basse-Saxe</region>
<settlement type="city">Göttingen</settlement>
</placeName>
</affiliation>
</author>
<author><name sortKey="Leimeister, Chris Andre" sort="Leimeister, Chris Andre" uniqKey="Leimeister C" first="Chris André" last="Leimeister">Chris André Leimeister</name>
<affiliation wicri:level="3"><nlm:affiliation>University of Göttingen, Department of Bioinformatics, Goldschmidtstr. 1, Göttingen, 37073 Germany.</nlm:affiliation>
<country xml:lang="fr">Allemagne</country>
<wicri:regionArea>University of Göttingen, Department of Bioinformatics, Goldschmidtstr. 1, Göttingen</wicri:regionArea>
<placeName><region type="land" nuts="2">Basse-Saxe</region>
<settlement type="city">Göttingen</settlement>
</placeName>
</affiliation>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">PubMed</idno>
<date when="2015">2015</date>
<idno type="RBID">pubmed:25685176</idno>
<idno type="pmid">25685176</idno>
<idno type="doi">10.1186/s13015-015-0032-x</idno>
<idno type="wicri:Area/PubMed/Corpus">001700</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Corpus" wicri:corpus="PubMed">001700</idno>
<idno type="wicri:Area/PubMed/Curation">001700</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Curation">001700</idno>
<idno type="wicri:Area/PubMed/Checkpoint">001559</idno>
<idno type="wicri:explorRef" wicri:stream="Checkpoint" wicri:step="PubMed">001559</idno>
<idno type="wicri:Area/Ncbi/Merge">001041</idno>
<idno type="wicri:Area/Ncbi/Curation">001041</idno>
<idno type="wicri:Area/Ncbi/Checkpoint">001041</idno>
<idno type="wicri:doubleKey">1748-7188:2015:Morgenstern B:estimating:evolutionary:distances</idno>
<idno type="wicri:Area/Main/Merge">001809</idno>
<idno type="wicri:Area/Main/Curation">001804</idno>
<idno type="wicri:Area/Main/Exploration">001804</idno>
<idno type="wicri:Area/France/Extraction">000087</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title xml:lang="en">Estimating evolutionary distances between genomic sequences from spaced-word matches.</title>
<author><name sortKey="Morgenstern, Burkhard" sort="Morgenstern, Burkhard" uniqKey="Morgenstern B" first="Burkhard" last="Morgenstern">Burkhard Morgenstern</name>
<affiliation wicri:level="3"><nlm:affiliation>University of Göttingen, Department of Bioinformatics, Goldschmidtstr. 1, Göttingen, 37073 Germany ; Université d'Evry Val d'Essonne, Laboratoire Statistique et Génome, UMR CNRS 8071, USC INRA 23 Boulevard de France, Evry, 91037 France.</nlm:affiliation>
<country xml:lang="fr">France</country>
<wicri:regionArea>University of Göttingen, Department of Bioinformatics, Goldschmidtstr. 1, Göttingen, 37073 Germany ; Université d'Evry Val d'Essonne, Laboratoire Statistique et Génome, UMR CNRS 8071, USC INRA 23 Boulevard de France, Evry</wicri:regionArea>
<placeName><region type="region">Île-de-France</region>
<region type="old region">Île-de-France</region>
<settlement type="city">Évry (Essonne)</settlement>
</placeName>
</affiliation>
</author>
<author><name sortKey="Zhu, Bingyao" sort="Zhu, Bingyao" uniqKey="Zhu B" first="Bingyao" last="Zhu">Bingyao Zhu</name>
<affiliation wicri:level="3"><nlm:affiliation>University of Göttingen, Department of General Microbiology, Grisebachstr. 8, Göttingen, 37073 Germany.</nlm:affiliation>
<country xml:lang="fr">Allemagne</country>
<wicri:regionArea>University of Göttingen, Department of General Microbiology, Grisebachstr. 8, Göttingen</wicri:regionArea>
<placeName><region type="land" nuts="2">Basse-Saxe</region>
<settlement type="city">Göttingen</settlement>
</placeName>
</affiliation>
</author>
<author><name sortKey="Horwege, Sebastian" sort="Horwege, Sebastian" uniqKey="Horwege S" first="Sebastian" last="Horwege">Sebastian Horwege</name>
<affiliation wicri:level="3"><nlm:affiliation>University of Göttingen, Department of Bioinformatics, Goldschmidtstr. 1, Göttingen, 37073 Germany.</nlm:affiliation>
<country xml:lang="fr">Allemagne</country>
<wicri:regionArea>University of Göttingen, Department of Bioinformatics, Goldschmidtstr. 1, Göttingen</wicri:regionArea>
<placeName><region type="land" nuts="2">Basse-Saxe</region>
<settlement type="city">Göttingen</settlement>
</placeName>
</affiliation>
</author>
<author><name sortKey="Leimeister, Chris Andre" sort="Leimeister, Chris Andre" uniqKey="Leimeister C" first="Chris André" last="Leimeister">Chris André Leimeister</name>
<affiliation wicri:level="3"><nlm:affiliation>University of Göttingen, Department of Bioinformatics, Goldschmidtstr. 1, Göttingen, 37073 Germany.</nlm:affiliation>
<country xml:lang="fr">Allemagne</country>
<wicri:regionArea>University of Göttingen, Department of Bioinformatics, Goldschmidtstr. 1, Göttingen</wicri:regionArea>
<placeName><region type="land" nuts="2">Basse-Saxe</region>
<settlement type="city">Göttingen</settlement>
</placeName>
</affiliation>
</author>
</analytic>
<series><title level="j">Algorithms for molecular biology : AMB</title>
<idno type="ISSN">1748-7188</idno>
<imprint><date when="2015" type="published">2015</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc><textClass></textClass>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en">Alignment-free methods are increasingly used to calculate evolutionary distances between DNA and protein sequences as a basis of phylogeny reconstruction. Most of these methods, however, use heuristic distance functions that are not based on any explicit model of molecular evolution. Herein, we propose a simple estimator d N of the evolutionary distance between two DNA sequences that is calculated from the number N of (spaced) word matches between them. We show that this distance function is more accurate than other distance measures that are used by alignment-free methods. In addition, we calculate the variance of the normalized number N of (spaced) word matches. We show that the variance of N is smaller for spaced words than for contiguous words, and that the variance is further reduced if our spaced-words approach is used with multiple patterns of 'match positions' and 'don't care positions'. Our software is available online and as downloadable source code at: http://spaced.gobics.de/. </div>
</front>
</TEI>
<affiliations><list><country><li>Allemagne</li>
<li>France</li>
</country>
<region><li>Basse-Saxe</li>
<li>Île-de-France</li>
</region>
<settlement><li>Göttingen</li>
<li>Évry (Essonne)</li>
</settlement>
</list>
<tree><country name="France"><region name="Île-de-France"><name sortKey="Morgenstern, Burkhard" sort="Morgenstern, Burkhard" uniqKey="Morgenstern B" first="Burkhard" last="Morgenstern">Burkhard Morgenstern</name>
</region>
</country>
<country name="Allemagne"><region name="Basse-Saxe"><name sortKey="Zhu, Bingyao" sort="Zhu, Bingyao" uniqKey="Zhu B" first="Bingyao" last="Zhu">Bingyao Zhu</name>
</region>
<name sortKey="Horwege, Sebastian" sort="Horwege, Sebastian" uniqKey="Horwege S" first="Sebastian" last="Horwege">Sebastian Horwege</name>
<name sortKey="Leimeister, Chris Andre" sort="Leimeister, Chris Andre" uniqKey="Leimeister C" first="Chris André" last="Leimeister">Chris André Leimeister</name>
</country>
</tree>
</affiliations>
</record>
To manipulate this document under Unix (Dilib)
EXPLOR_STEP=$WICRI_ROOT/Sante/explor/MersV1/Data/France/Analysis
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000087 | SxmlIndent | more
Or
HfdSelect -h $EXPLOR_AREA/Data/France/Analysis/biblio.hfd -nk 000087 | SxmlIndent | more
To add a link to this page within the Wicri network
{{Explor lien |wiki= Sante |area= MersV1 |flux= France |étape= Analysis |type= RBID |clé= pubmed:25685176 |texte= Estimating evolutionary distances between genomic sequences from spaced-word matches. }}
To generate wiki pages
HfdIndexSelect -h $EXPLOR_AREA/Data/France/Analysis/RBID.i -Sk "pubmed:25685176" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/France/Analysis/biblio.hfd \
       | NlmPubMed2Wicri -a MersV1
This area was generated with Dilib version V0.6.33.